Homology search for genes

Authors

  • Xuefeng Cui
  • Tomás Vinar
  • Brona Brejová
  • Dennis Shasha
  • Ming Li
Abstract

MOTIVATION: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation.

RESULTS: We have developed a homology search solution that automates this process, and instead of HSPs returns complete gene structures. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene. On a testing set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene.

AVAILABILITY: The Java implementation is available for download from http://www.bioinformatics.uwaterloo.ca/software.
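The exon sensitivity and exon specificity figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the convention common in gene-finding evaluation: an exon counts as correct only if both of its boundaries match an annotated exon exactly, sensitivity is the fraction of annotated exons that were predicted, and specificity is the fraction of predicted exons that are annotated. The abstract does not spell out the authors' exact evaluation protocol, so the function name `exon_accuracy`, the coordinate tuples, and the toy data below are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# An exon is identified by (sequence name, start, end); this layout is an
# assumption made for the illustration, not taken from the paper.
Exon = Tuple[str, int, int]


def exon_accuracy(predicted: Iterable[Exon],
                  annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    An exon is counted as correct only when both boundaries match exactly.
    """
    pred: Set[Exon] = set(predicted)
    ref: Set[Exon] = set(annotated)
    exact = len(pred & ref)  # exons predicted with both boundaries correct
    sensitivity = exact / len(ref) if ref else 0.0
    specificity = exact / len(pred) if pred else 0.0
    return sensitivity, specificity


# Toy data: four annotated exons, three predictions, one with a wrong start.
annotated = [("chr1", 100, 200), ("chr1", 300, 400),
             ("chr1", 500, 600), ("chr1", 700, 800)]
predicted = [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 505, 600)]

sn, sp = exon_accuracy(predicted, annotated)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

In this toy case two of the four annotated exons are recovered exactly, and two of the three predictions are correct. The exact-boundary criterion is what makes reliable splice site identification matter: a prediction that is off by a few bases at one end counts as a miss under both measures.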

Similar resources

Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)

Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...

Expression analyses of endoglucanase gene in Penicillium oxalicum and Trichoderma viride

The expression of the endoglucanase gene and the protein profile of two fungal species, Penicillium oxalicum 1SMS and Trichoderma viride 156MS, both with high cellulase enzyme activity, were investigated. Fungal isolates were cultured on inducer CMC medium, and the amounts of released sugar and protein were then assayed every three days for a month, using arsenate molybdate reagent and Bradford method,...

Homology Search for Genes Using Biased HMMs

Motivation: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone...

Gene Prediction by Pattern Recognition and Homology Search

This paper presents an algorithm for combining pattern recognition-based exon prediction and database homology search in gene model construction. The goal is to use homologous genes or partial genes existing in the database as reference models while constructing (multiple) gene models from exon candidates predicted by pattern recognition methods. A unified framework for gene modeling is used fo...

Maximum Likelihood Estimation of Weight Matrices for Targeted Homology Search

Genome annotation relies to a large extent on the recognition of homologs to already known genes. The starting point for such protocols is a collection of known sequences from one or more species, from which a model is constructed – either automatically or manually – that encodes the defining features of a single gene or a gene family. The quality of these models eventually determines the succe...

Journal:
  • Bioinformatics

Volume 23, Issue 13

Pages: -

Publication date: 2007